


Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach

Young, Rory, Pugeault, Nicolas (School of Computing Science, University of Glasgow)

Neural Information Processing Systems

Deep reinforcement learning agents achieve state-of-the-art performance in a wide range of simulated control tasks. However, successful applications to real-world problems remain limited. One reason for this gap is that the learnt policies are not robust to observation noise or adversarial attacks. In this paper, we investigate the robustness of deep RL policies to a single small state perturbation in deterministic continuous control tasks.
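The quantity this abstract alludes to can be illustrated numerically: the maximal Lyapunov exponent of a closed-loop system measures how fast a small state perturbation grows or shrinks along a trajectory. The sketch below is an illustrative assumption, not the paper's method: a toy linear system with a hand-picked linear feedback in place of a learnt policy.

```python
import numpy as np

def step(x, u):
    # Toy deterministic dynamics: slightly unstable open-loop linear system.
    A = np.array([[1.05, 0.1], [0.0, 0.95]])
    B = np.array([[0.0], [1.0]])
    return A @ x + B @ u

def policy(x):
    # Hand-picked feedback gain (an assumption, not a learnt policy);
    # chosen so the closed-loop eigenvalues are both 0.5.
    K = np.array([[3.025, 1.0]])
    return -K @ x

def max_lyapunov_estimate(x0, eps=1e-6, steps=200):
    x, x_pert = x0.copy(), x0 + eps * np.array([1.0, 0.0])
    log_growth = 0.0
    for _ in range(steps):
        x = step(x, policy(x))
        x_pert = step(x_pert, policy(x_pert))
        d = np.linalg.norm(x_pert - x)
        log_growth += np.log(d / eps)
        # Renormalise so the perturbation stays infinitesimal.
        x_pert = x + (eps / d) * (x_pert - x)
    return log_growth / steps  # negative => perturbations shrink

lam = max_lyapunov_estimate(np.array([1.0, -0.5]))
print(lam)
```

A negative estimate means nearby trajectories converge under the policy, which is one way to quantify robustness to a single small state perturbation.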



\[
\frac{1}{2}
\begin{bmatrix} X_t \\ U_t \end{bmatrix}^{\top}
\begin{bmatrix} H^t_{xx} & H^t_{xu} \\ H^t_{ux} & H^t_{uu} \end{bmatrix}
\begin{bmatrix} X_t \\ U_t \end{bmatrix}
\]

Neural Information Processing Systems

Based on Lemma 5.1 and its proof, we know that the PMP of the auxiliary control system, (S.2), is exactly the differential PMP equations (13). Thus, below we only consider the differential PMP equations in (S.2). In the system identification experiment, we collect a total of five trajectories from systems with known dynamics (in Table 2), where different trajectories ξ^o = {x^o_{0:T}, u_{0:T-1}} have different initial conditions x_0 and horizons T (T ranges from 10 to 20), with random inputs u_{0:T-1} drawn from a uniform distribution. In fact, throughout the entire learning process, PDP always guarantees that the policy constraint is perfectly respected (as the forward pass strictly follows the policy). Please see Appendix Fig. S4 for validation.
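The data-collection setup described above can be sketched in a few lines. This is a minimal illustration under assumed scalar dynamics, not the paper's experiment: five trajectories with random horizons in [10, 20] and uniform random inputs, followed by a least-squares fit of the dynamics parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_dynamics(x, u, theta):
    # Known (assumed) scalar dynamics: x_{t+1} = theta[0]*x + theta[1]*u.
    return theta[0] * x + theta[1] * u

theta_true = np.array([0.9, 0.5])

# Five trajectories with random horizons T in [10, 20], random initial
# conditions, and uniform random inputs, mirroring the setup above.
trajectories = []
for _ in range(5):
    T = int(rng.integers(10, 21))
    x = rng.uniform(-1, 1)
    xs, us = [x], rng.uniform(-1, 1, size=T)
    for t in range(T):
        x = true_dynamics(x, us[t], theta_true)
        xs.append(x)
    trajectories.append((np.array(xs), us))

# Least-squares identification of theta from the collected data.
X = np.concatenate([np.stack([xs[:-1], us], axis=1) for xs, us in trajectories])
y = np.concatenate([xs[1:] for xs, _ in trajectories])
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta_hat)  # close to theta_true
```

With noiseless data the fit recovers the true parameters essentially exactly; PDP itself computes the required trajectory derivatives analytically rather than by regression.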





Reinforcement Learning for Control Systems with Time Delays: A Comprehensive Survey

Neto, Armando Alves

arXiv.org Machine Learning

In the last decade, Reinforcement Learning (RL) has achieved remarkable success in the control and decision-making of complex dynamical systems. However, most RL algorithms rely on the Markov Decision Process assumption, which is violated in practical cyber-physical systems affected by sensing delays, actuation latencies, and communication constraints. Such time delays introduce memory effects that can significantly degrade performance and compromise stability, particularly in networked and multi-agent environments. This paper presents a comprehensive survey of RL methods designed to address time delays in control systems. We first formalize the main classes of delays and analyze their impact on the Markov property. We then systematically categorize existing approaches into five major families: state augmentation and history-based representations, recurrent policies with learned memory, predictor-based and model-aware methods, robust and domain-randomized training strategies, and safe RL frameworks with explicit constraint handling. For each family, we discuss underlying principles, practical advantages, and inherent limitations. A comparative analysis highlights key trade-offs among these approaches and provides practical guidelines for selecting suitable methods under different delay characteristics and safety requirements. Finally, we identify open challenges and promising research directions, including stability certification, large-delay learning, multi-agent communication co-design, and standardized benchmarking. This survey aims to serve as a unified reference for researchers and practitioners developing reliable RL-based controllers in delay-affected cyber-physical systems.
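The first family the survey names, state augmentation, has a simple core idea: for a known constant actuation delay d, the pair (current observation, last d actions) is a sufficient state, which restores the Markov property. The toy environment below is an assumption for illustration, not from the survey.

```python
from collections import deque
import numpy as np

class DelayedIntegrator:
    """Toy plant x_{t+1} = x_t + u_{t-d}: actions take effect d steps late."""
    def __init__(self, delay):
        self.delay = delay
        self.x = 0.0
        self.buffer = deque([0.0] * delay, maxlen=delay) if delay else None

    def step(self, u):
        if self.delay:
            applied = self.buffer.popleft()  # action issued d steps ago
            self.buffer.append(u)
        else:
            applied = u
        self.x += applied
        return self.x

def augmented_state(env):
    # Observation plus the in-flight actions not yet applied: this tuple
    # is Markov even though the raw observation alone is not.
    pending = list(env.buffer) if env.delay else []
    return np.array([env.x] + pending)

env = DelayedIntegrator(delay=2)
env.step(1.0)    # applies the initial 0.0, queues 1.0
env.step(-0.5)   # applies the initial 0.0, queues -0.5
print(augmented_state(env))  # [0.0, 1.0, -0.5]
```

The cost of this family, as the survey notes, is that the augmented state grows linearly with the delay, which motivates the recurrent and predictor-based alternatives.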


Pontryagin Differentiable Programming: An End-to-End Learning and Control Framework

Neural Information Processing Systems

This paper develops a Pontryagin differentiable programming (PDP) methodology, which establishes a unified framework to solve a broad class of learning and control tasks. The PDP is distinguished from existing methods by two novel techniques: first, we differentiate through Pontryagin's Maximum Principle, which allows us to obtain the analytical derivative of a trajectory with respect to tunable parameters within an optimal control system, enabling end-to-end learning of dynamics, policies, and/or control objective functions; and second, we propose an auxiliary control system in the backward pass of the PDP framework, whose output is the analytical derivative of the original system's trajectory with respect to the parameters, which can be iteratively solved using standard control tools. We investigate three learning modes of the PDP: inverse reinforcement learning, system identification, and control/planning. We demonstrate the capability of the PDP in each learning mode on different high-dimensional systems, including a multi-link robot arm, a 6-DoF maneuvering UAV, and 6-DoF rocket-powered landing.
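The central object here is the derivative of a trajectory with respect to parameters. PDP obtains it analytically via an auxiliary control system; the sketch below only illustrates what that quantity is, using a central finite difference on an assumed toy scalar system (not PDP itself).

```python
import numpy as np

def rollout(theta, x0=1.0, T=10):
    # Toy system x_{t+1} = theta * x_t + u_t with a fixed input sequence.
    us = 0.1 * np.ones(T)
    xs = [x0]
    for t in range(T):
        xs.append(theta * xs[-1] + us[t])
    return np.array(xs)

def traj_grad_fd(theta, h=1e-6):
    # d(trajectory)/d(theta), the quantity PDP computes analytically.
    return (rollout(theta + h) - rollout(theta - h)) / (2 * h)

# For this linear system the derivative has a closed form we can check:
# x_t = theta^t x0 + sum_k theta^(t-1-k) u_k, so d x_1/dθ = x0 = 1,
# d x_2/dθ = 2θ + u_0-free term derivative = 2θ + 0.1.
g = traj_grad_fd(0.8)
print(g[:3])  # ≈ [0.0, 1.0, 1.7]
```

Finite differencing scales poorly with the parameter dimension; PDP's auxiliary-system construction delivers the same derivative exactly and efficiently, which is its main contribution.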


Smart Traffic Signals: Comparing MARL and Fixed-Time Strategies

Mahato, Saahil

arXiv.org Artificial Intelligence

Urban traffic congestion, particularly at intersections, significantly affects travel time, fuel consumption, and emissions. Traditional fixed-time signal control systems often lack the adaptability to effectively manage dynamic traffic patterns. This study explores the application of multi-agent reinforcement learning (MARL) to optimize traffic signal coordination across multiple intersections within a simulated environment. A simulation was developed to model a network of interconnected intersections with randomly generated vehicle flows to reflect realistic traffic variability. A decentralized MARL controller was implemented in which each traffic signal operates as an autonomous agent, making decisions based on local observations and information from neighboring agents. Performance was evaluated against a baseline fixed-time controller using metrics such as average vehicle wait time and overall throughput. The MARL approach demonstrated statistically significant improvements, including reduced average waiting times and improved throughput. These findings suggest that MARL-based dynamic control strategies hold substantial promise to improve urban traffic management efficiency. More research is recommended to address the challenges of scalability and real-world implementation.
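The fixed-time-versus-adaptive comparison can be made concrete with a toy single-intersection queue model. Everything below (arrival rates, service model, controllers) is an illustrative assumption, not the paper's simulator; the adaptive rule is a simple "serve the longer queue" heuristic standing in for a learned agent.

```python
import random

def simulate(controller, steps=5000, seed=1):
    rng = random.Random(seed)
    q = [0, 0]               # queued vehicles on NS and EW approaches
    queue_sum = 0
    for t in range(steps):
        # Asymmetric random arrivals: NS is busier than EW.
        if rng.random() < 0.4: q[0] += 1
        if rng.random() < 0.2: q[1] += 1
        green = controller(t, q)            # which approach gets green
        q[green] -= min(q[green], 1)        # one vehicle served per step
        queue_sum += q[0] + q[1]
    return queue_sum / steps                # mean total queue length

fixed_time = lambda t, q: (t // 10) % 2     # alternate green every 10 steps
adaptive   = lambda t, q: 0 if q[0] >= q[1] else 1

print(simulate(fixed_time), simulate(adaptive))
```

The adaptive rule never wastes a green phase on an empty approach while the other approach has queued vehicles, so its mean queue length (a proxy for waiting time) comes out lower, which is the qualitative effect the study reports for MARL at larger scale.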


BUDD-e: an autonomous robotic guide for visually impaired users

Li, Jinyang, Farina, Marcello, Mozzarelli, Luca, Cattaneo, Luca, Rattamasanaprapai, Panita, Tagarelli, Eleonora A., Corno, Matteo, Perego, Paolo, Andreoni, Giuseppe, Lettieri, Emanuele

arXiv.org Artificial Intelligence

This paper describes the design and realization of a prototype of the novel guide robot BUDD-e for visually impaired users. The robot has been tested in a real scenario with the help of visually disabled volunteers at ASST Grande Ospedale Metropolitano Niguarda, in Milan. The results of the experimental campaign are thoroughly described in the paper, displaying its remarkable performance and user acceptance. Index Terms: assistive technologies, autonomous navigation, autonomous robotics, autonomous guide for visually impaired users. According to [1], in 2020 the number of totally blind people was estimated at about 49.1 million (about 0.6% of the world population), while people with severe and moderate vision problems were estimated at 33.6 million (about 0.4% of the world population) and 221.4 million (about 2.8% of the world population), respectively. Furthermore, due to an aging population, the rate of people affected by vision problems is expected to keep increasing in the coming decades [2]. People with visual impairments currently face a number of issues when visiting public spaces and using services. It is very difficult for blind and partially sighted persons to access shared spaces (areas where cars, buses, pedestrians, and cyclists share the same space) alone, since important inclusive environmental aids are frequently removed in communal areas. As discussed in [3], navigating inside a shopping mall can be tiring and stressful for a blind or low-vision person. Shopping in groceries is practically impossible, and shopping centers often do not have enough staff on duty to offer help. Emanuele Lettieri is with the Department of Management, Economics and Industrial Engineering, Politecnico di Milano, Via Lambruschini 4, Milan, Italy (e-mail: emanuele.lettieri@polimi.it).